Changes over time in behavioral measurements, such as attentional selection abilities drifting due to trial-by-trial fluctuations in fatigue, are difficult to identify and to control for. To address this issue, this vignette demonstrates the use and interpretation of by-trial basis functions. This is a relatively simple method of controlling for unknown fluctuations over time while also allowing for the recovery of those fluctuations. While there are many other extant methods that accomplish similar goals (e.g., Ashwood et al. 2022; Deboeck et al. 2009; Zhang et al. 2019), the method presented here is a good compromise between ease of use, flexibility, and power.
Using the time_basisFun_mem() function in the TEfits package (Cochrane 2020), models can be specified similarly to lme4 or brms, with a couple of minor additions. The time_basisFun_mem() function is a wrapper for lme4::lmer, lme4::glmer, or brms::brm that augments an input mixed-effects model (data and formula) with basis functions over time.
Please don’t forget to cite TEfits (Cochrane 2020) if you use this software.
As an introductory example, it is possible to fit a dataset with a simple linear mixed-effects model with basis functions over time. Here we will first look at the Approximate Number System training data included in the TEfits package (originally from Cochrane et al. 2019). We will fit accuracy as predicted by a main effect of the absolute value of stimulus strength (ratio of black dots to white dots), with by-participant intercepts, by-participant slopes, and by-participant basis functions over trial number.
library(TEfits)
library(lme4)
#> Warning: package 'lme4' was built under R version 4.1.3
#> Loading required package: Matrix
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.1.3
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
#> Warning: package 'tidyr' was built under R version 4.1.3
#>
#> Attaching package: 'tidyr'
#> The following objects are masked from 'package:Matrix':
#>
#> expand, pack, unpack
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.1.3
m1 <- time_basisFun_mem(acc ~ abs(ratio) + (abs(ratio) | subID)
,data_mem = anstrain
,groupingVarName = 'subID'
,timeVarName = 'trialNum')
#> boundary (singular) fit: see help('isSingular')
mNULL <- lmer(acc ~ abs(ratio) + (abs(ratio) | subID)
,anstrain)
#> boundary (singular) fit: see help('isSingular')
summary(m1)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: acc ~ abs(ratio) + (abs(ratio) | subID) + ((0 + basis_cen_M62 |
#> subID) + (0 + basis_cen_0 | subID) + (0 + basis_cen_62 |
#> subID) + (0 + basis_cen_124 | subID) + (0 + basis_cen_186 |
#> subID) + (0 + basis_cen_248 | subID) + (0 + basis_cen_310 | subID))
#> Data: data_mem
#>
#> REML criterion at convergence: 980.6
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -2.5428 0.0040 0.2951 0.6937 1.1515
#>
#> Random effects:
#> Groups Name Variance Std.Dev. Corr
#> subID (Intercept) 0.000e+00 0.000e+00
#> abs(ratio) 6.412e-10 2.532e-05 NaN
#> subID.1 basis_cen_M62 3.797e-11 6.162e-06
#> subID.2 basis_cen_0 2.590e-09 5.089e-05
#> subID.3 basis_cen_62 2.304e-11 4.800e-06
#> subID.4 basis_cen_124 0.000e+00 0.000e+00
#> subID.5 basis_cen_186 2.940e-02 1.715e-01
#> subID.6 basis_cen_248 0.000e+00 0.000e+00
#> subID.7 basis_cen_310 3.882e-09 6.230e-05
#> Residual 1.540e-01 3.924e-01
#> Number of obs: 1000, groups: subID, 4
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 0.51573 0.02853 18.08
#> abs(ratio) 0.87381 0.08203 10.65
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> abs(ratio) -0.794
#> optimizer (nloptwrap) convergence code: 0 (OK)
#> boundary (singular) fit: see help('isSingular')
This is not a very good model specification for the data (because the predicted variable is binary, a generalized model would be more appropriate). Still, it is easy to see that basis functions over trial number are simple to implement. Here they also increase the t value of the fixed-effects predictor (from 10.45 in the null model to 10.65), illustrating how accounting for fluctuations over time can improve power for fixed effects. This is a common observation, as shown below.
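To see concretely why a generalized model suits binary accuracy better, here is a small self-contained sketch (simulated data, not the anstrain dataset) contrasting a linear fit with a logistic fit:

```r
# Illustration with simulated data (not anstrain): a linear model can produce
# fitted "accuracies" outside [0, 1], while a logistic (generalized) model
# constrains its predictions to valid probabilities.
set.seed(2)
strength <- runif(500, 0, 1)                      # stand-in stimulus strength
acc <- rbinom(500, 1, plogis(-1 + 4 * strength))  # simulated binary accuracy
lin_fit <- lm(acc ~ strength)
logit_fit <- glm(acc ~ strength, family = binomial)
range(fitted(lin_fit))    # linear fits can fall outside [0, 1]
range(fitted(logit_fit))  # logistic fits always stay within (0, 1)
```

The same logic extends to the mixed-effects setting via lme4::glmer with `family = binomial`.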
The main measurement of time considered here is trial number. By converting the single vector of positive linear trial numbers into a set of overlapping basis functions, those basis functions can be included in a mixed-effects model and can approximate arbitrary fluctuations over time (i.e., as random-effects offsets from the “main” model fits). The function TEfits::time_basisFun_df can be used to construct the basis functions:
basis_df <- time_basisFun_df(1:500, 100)
pivot_longer(basis_df, 2:ncol(basis_df)) %>%
  ggplot(aes(x = time_orig, y = value, color = name)) +
  theme_bw() +
  geom_line() +
  labs(x = 'Trial Number', y = 'Basis function weight', color = 'Basis function\ntrial center')
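For intuition, the same idea can be sketched in base R. This is an illustrative raised-cosine construction, not necessarily TEfits's exact implementation: each trial receives weights from a few overlapping bumps, and together the bumps tile the whole timecourse.

```r
# Illustrative sketch (not TEfits internals): overlapping raised-cosine basis
# functions over trials, with peaks spaced `width` trials apart.
make_bases <- function(trials, width) {
  centers <- seq(-width, max(trials) + width, by = width)
  sapply(centers, function(cen) {
    w <- (trials - cen) / (2 * width)      # scaled distance from this peak
    ifelse(abs(w) < .5, cos(pi * w)^2, 0)  # smooth bump, zero elsewhere
  })
}
b <- make_bases(1:500, 100)
dim(b)             # 500 trials by 8 basis columns
range(rowSums(b))  # the bumps sum to 1 at every trial (a partition of unity)
```

Because adjacent bumps overlap smoothly, any slowly varying timecourse can be approximated as a weighted sum of the columns.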
This process is automated within model fitting by time_basisFun_mem(). As an example, we will generate some data and fit a model to it. The data will have fluctuations over time, with no variable other than trial number predicting those fluctuations. In addition, there will be a "condition-level" binary variable "X" that will be fit with both between-subjects fixed effects and within-subject random effects.
## generate variations over time
set.seed(9876)
genBasisOffsets <- function(trials, interPeakWidth){
  # construct basis functions over trials, dropping the time column
  dTmp <- time_basisFun_df(trials, interPeakWidth)
  dTmp <- dTmp[, 2:ncol(dTmp)]
  # randomly pick one third of the bases to weight positively and one third negatively
  basisRandomOffsetN <- ceiling(ncol(dTmp) / 3)
  shuffleCols <- sample(1:ncol(dTmp))
  # combine the positively and negatively weighted bases into one fluctuating timecourse
  basisRandomOffset <- rowSums(dTmp[, shuffleCols[1:basisRandomOffsetN]]) -
    rowSums(dTmp[, shuffleCols[(basisRandomOffsetN + 1):(2 * basisRandomOffsetN)]])
  return(basisRandomOffset)
}
nTrials <- 300
interPeakWidth <- 30
fluctBeta <- .5
xVarBeta <- .5
dat <- do.call('rbind'
,replicate(10,{
subjDat <- data.frame(
subID = paste0(sample(letters,5),sample(0:9,5),collapse='')
,trialNum = 1:nTrials
,xVar = sample(c(-.5,.5),nTrials,replace = T)
)
subjDat$subjBases <- genBasisOffsets(subjDat$trialNum, interPeakWidth + rnorm(1))
subjDat$yVar <- rnorm(1,0,.25) + # subject-level intercept
rnorm(nTrials) + # "true" noise
xVarBeta * rnorm(1,1,.1) * subjDat$xVar + # "X" with random subject-level offset
fluctBeta * subjDat$subjBases # Bases over time
subjDat
},simplify = F)
)
psych::pairs.panels(dat,hist.col='darkred')
Here we see that there are fluctuations over time, but upon visual inspection the outcome variable yVar is only somewhat correlated with these fluctuations, and any trends over time are not visible due to the random noise in the data.
First we will fit the model to the data we just generated, then we will look at the summary and recovered timecourse.
m2 <- time_basisFun_mem(yVar ~ xVar + (xVar | subID)
,data_mem = dat
,groupingVarName = 'subID'
,timeVarName = 'trialNum'
,basisDens = 30
)
#> boundary (singular) fit: see help('isSingular')
summary(m2)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: yVar ~ xVar + (xVar | subID) + ((0 + basis_cen_M30 | subID) +
#> (0 + basis_cen_0 | subID) + (0 + basis_cen_30 | subID) +
#> (0 + basis_cen_60 | subID) + (0 + basis_cen_90 | subID) +
#> (0 + basis_cen_120 | subID) + (0 + basis_cen_150 | subID) +
#> (0 + basis_cen_180 | subID) + (0 + basis_cen_210 | subID) +
#> (0 + basis_cen_240 | subID) + (0 + basis_cen_270 | subID) +
#> (0 + basis_cen_300 | subID) + (0 + basis_cen_330 | subID))
#> Data: data_mem
#>
#> REML criterion at convergence: 8547.4
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.4866 -0.6802 0.0175 0.6640 3.7269
#>
#> Random effects:
#> Groups Name Variance Std.Dev. Corr
#> subID (Intercept) 5.032e-02 2.243e-01
#> xVar 7.093e-03 8.422e-02 -0.47
#> subID.1 basis_cen_M30 2.319e-06 1.523e-03
#> subID.2 basis_cen_0 6.006e-01 7.750e-01
#> subID.3 basis_cen_30 1.477e-08 1.215e-04
#> subID.4 basis_cen_60 1.321e-01 3.635e-01
#> subID.5 basis_cen_90 9.989e-02 3.160e-01
#> subID.6 basis_cen_120 5.969e-09 7.726e-05
#> subID.7 basis_cen_150 2.400e-01 4.899e-01
#> subID.8 basis_cen_180 3.861e-02 1.965e-01
#> subID.9 basis_cen_210 0.000e+00 0.000e+00
#> subID.10 basis_cen_240 1.097e-01 3.312e-01
#> subID.11 basis_cen_270 4.390e-02 2.095e-01
#> subID.12 basis_cen_300 7.769e-09 8.814e-05
#> subID.13 basis_cen_330 6.117e-01 7.821e-01
#> Residual 9.842e-01 9.921e-01
#> Number of obs: 3000, groups: subID, 10
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) -0.1137 0.0778 -1.462
#> xVar 0.4836 0.0452 10.700
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> xVar -0.242
#> optimizer (nloptwrap) convergence code: 0 (OK)
#> boundary (singular) fit: see help('isSingular')
m2_fitted_timeCourse <- predict(m2, random.only = TRUE
  ,re.form = as.formula(paste('~', gsub('xVar + (xVar | subID) + ', ''
    ,as.character(formula(m2))[3], fixed = TRUE))))
plot(dat$trialNum,m2_fitted_timeCourse)
The recovered time course appears to be a reasonable approximation of the true change, and indeed, it correlates with the true “unknown” fluctuations at r = 0.72.
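The recovery mechanism itself needs nothing beyond regression on the basis columns. Here is a self-contained base-R analogue using a plain lm() with an illustrative raised-cosine basis (the vignette's models use random effects instead, but the principle is the same): regressing noisy data on basis columns recovers a smooth latent drift.

```r
# Self-contained analogue of timecourse recovery: ordinary lm() on basis
# columns (illustrative; not the mixed-effects machinery used above).
set.seed(1)
trials <- 1:300
centers <- seq(-30, 330, by = 30)
B <- sapply(centers, function(cen) {
  w <- (trials - cen) / 60
  ifelse(abs(w) < .5, cos(pi * w)^2, 0)  # raised-cosine bump around cen
})
trueFluct <- as.vector(B %*% rnorm(ncol(B)))  # latent drift over trials
y <- trueFluct + rnorm(length(trials))        # observed data = drift + noise
recovered <- fitted(lm(y ~ B))                # basis-regression estimate
cor(recovered, trueFluct)                     # typically high
```

The mixed-effects version adds shrinkage of the per-subject basis weights toward zero, which guards against overfitting the noise.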
The above workflow of data simulation and fitting can be repeated for many combinations of trial counts, sizes of the true generating basis functions, and sizes of the basis functions chosen for model fitting. Below are the smoothed results of approximately 1,000 such simulations; results are shown for the correlation between true and recovered fluctuations, for the change in fixed-effect t value, and for the out-of-sample change in log-likelihood (see ?TEfits::time_basisFun_mem).
Within the typical regime (heuristically denoted by the grey dashed line) recovery appears to be fairly good, and within the area where the model basis size is similar to the generative basis size (black dashed line) it is quite high.
Power does not appear to be decreased in any case, and under certain circumstances the fixed-effect t value increases substantially. Because these simulations always include a true fixed effect in the simulated datasets, this corresponds to an increase in power in most cases.
Out-of-sample predictive likelihood is often quite good (see ?time_basisFun_mem for details). However, the improvements in predictive likelihood are not noticeably higher when the model's basis function size matches the generative basis function size, indicating that this method of accounting for variation over time is not selective (i.e., choosing the best-fitting basis function size does not allow recovery of the true basis function size).